[SPARK-31190][SQL] ScalaReflection should not erasure user defined AnyVal type#27959
[SPARK-31190][SQL] ScalaReflection should not erasure user defined AnyVal type#27959Ngone51 wants to merge 3 commits intoapache:masterfrom
Conversation
| if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) { | ||
| tpe | ||
| } else { | ||
| tpe.erasure |
There was a problem hiding this comment.
Basically, it convert a Scala type to Java type. e.g. scala.Any -> java.lang.Object.
|
Test build #120053 has finished for PR 27959 at commit
|
| * or nested classes in package objects, it uses the dollar sign ($) to create | ||
| * synthetic classes, emulating behaviour in Java bytecode. | ||
| */ | ||
| def getClassNameFromType(tpe: `Type`): String = { |
There was a problem hiding this comment.
do we need to quote Type?
There was a problem hiding this comment.
I don't know it clearly...just copied from the original one.
| * synthetic classes, emulating behaviour in Java bytecode. | ||
| */ | ||
| def getClassNameFromType(tpe: `Type`): String = { | ||
| tpe.dealias.erasure.typeSymbol.asClass.fullName |
There was a problem hiding this comment.
Why it's a problem only for AnyVal? Here we blindly erasure everything.
There was a problem hiding this comment.
For a AnVal class, e.g. case class Foo(i: Int) extends AnyVal, erasure on it will return type of Int instead of Foo. So, this actually cause a problem when we add the class name to the walkedTypePath.
| } | ||
|
|
||
| private def erasure(tpe: Type): Type = { | ||
| if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) { |
There was a problem hiding this comment.
can we add some comments to explain why we special case AnyVal?
|
Test build #120097 has finished for PR 27959 at commit
|
|
Jenkins, retest this please. |
|
Test build #120172 has finished for PR 27959 at commit
|
|
thanks, merging to master/3.0! |
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes #27959 from Ngone51/impr_anyval. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 5c4d44b) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
thanks all! |
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR apache#15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes apache#27959 from Ngone51/impr_anyval. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Improve
ScalaReflectionto only don't erasure non user definedAnyValtype, but still erasure other types, e.g.Any. And this brings two benefits:Give better encode error message for some unsupported types, e.g.
AnyWon't miss the walk path for the
AnyValtypeWhy are the changes needed?
Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends
AnyVal, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g.Any, it will fail onjava.lang.ClassNotFoundException: scala.Anydue to the reason thatscala.Anydoesn't erasure tojava.lang.Object.Also, in current
getClassNameFromType(), it always erasure types which could missing walked path for user definedAnyValtypes.Does this PR introduce any user-facing change?
Yes. For the test below:
Before:
After:
How was this patch tested?
Added unit test and test manually.